Mining web log datasets has been studied extensively using Frequent Pattern Mining (FPM) and its variants. Identifying frequent patterns across sequences helps in analyzing the most common sub-sequences (e.g., pages visited together). However, this approach cannot identify general structures that span multiple sequences. To capture such general structures, we introduce a new form of sequential pattern mining called super-sequence frequent pattern mining (SS-FPM). In contrast to the sub-sequences determined by FPM, SS-FPM determines super-sequences that contain the common parts of different sequences. This can be useful for understanding the general behavior or flow of users in web usage mining, classifying web pages and users, and making predictions. In essence, finding frequent super-sequence patterns turns out to be related to the well-known heaviest (longest) path problem in graphs, which is known to be NP-hard. Accordingly, we transform a given sequential dataset into a sequence graph and formulate the problem as a k-hop heaviest path problem. We then propose an efficient heuristic, called the sequence matrix method, that uses dynamic programming techniques. We compared our method to the existing Heavypath method. The results show that our method is more efficient, especially on large datasets.
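To make the graph formulation above more concrete, the following is a minimal illustrative sketch and not the sequence matrix method itself, whose details are not given here. It assumes that the sequence graph has one vertex per item and that an edge weight counts how often one item immediately follows another across the dataset. It also relaxes the problem to a heaviest walk with exactly k edges, so vertices may repeat, which is what makes a simple dynamic program applicable. The function names `build_sequence_graph` and `heaviest_k_hop_walk` are hypothetical.

```python
from collections import defaultdict


def build_sequence_graph(sequences):
    """Build edge weights: how often item a is immediately followed by item b.

    Assumption for this sketch: weights are plain transition counts.
    """
    weights = defaultdict(int)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            weights[(a, b)] += 1
    return dict(weights)


def heaviest_k_hop_walk(weights, k):
    """Return (total_weight, walk) for a heaviest walk with exactly k edges.

    Vertices may repeat, so this is a relaxation of the simple-path
    problem, not an exact solver for it. Returns None if no such walk exists.
    """
    nodes = sorted({v for edge in weights for v in edge})
    # best[v] = (weight, walk) of the heaviest walk found so far ending at v
    best = {v: (0, [v]) for v in nodes}
    for _ in range(k):
        nxt = {}
        # Sorted iteration keeps tie-breaking deterministic.
        for (u, v), w in sorted(weights.items()):
            if u not in best:
                continue
            cand_weight = best[u][0] + w
            if v not in nxt or cand_weight > nxt[v][0]:
                nxt[v] = (cand_weight, best[u][1] + [v])
        best = nxt
    if not best:
        return None
    return max(best.values(), key=lambda entry: entry[0])


if __name__ == "__main__":
    # Toy sessions, invented purely for illustration.
    sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C", "D"]]
    graph = build_sequence_graph(sessions)
    print(heaviest_k_hop_walk(graph, 3))  # (5, ['A', 'B', 'C', 'D'])
```

In this toy example the 3-hop result A, B, C, D is longer than any single input session yet contains the common parts of all three, which is the intuition behind a super-sequence pattern.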